Aiding Web Searches by Statistical Classification Tools
Abstract
We describe an infrastructure for the collection and management of large amounts of text, and discuss the possibility of information extraction and visualisation from text corpora with statistical methods. The paper gives an overview of processing steps, the contents of our text databases as well as different query facilities. Our focus is on the extraction and visualisation of collocations and their usage for aiding web searches.
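The abstract does not say which statistical measure is used to extract collocations. As a minimal sketch only, assuming pointwise mutual information (PMI) over adjacent word pairs, the idea can be illustrated as follows. The function name `pmi_collocations`, the `min_count` threshold, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_collocations(sentences, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Assumption: PMI stands in for whatever statistical measure the
    paper actually uses; tokenisation is plain whitespace splitting.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        # Adjacent pairs only; pairs never span sentence boundaries.
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the search engine returned results",
    "a search engine indexes web pages",
    "the engine of the car is new",
    "web pages link to other web pages",
]
ranked = pmi_collocations(corpus)
for pair, score in ranked:
    print(pair, round(score, 2))
```

In a web-search setting, high-scoring pairs such as these could be offered as candidate terms for refining a query; how the paper itself uses collocations for this purpose is not detailed in the abstract.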
Similar resources
TCDB: the Transporter Classification Database for membrane transport protein analyses and information
The Transporter Classification Database (TCDB) is a web accessible, curated, relational database containing sequence, classification, structural, functional and evolutionary information about transport systems from a variety of living organisms. TCDB is a curated repository for factual information compiled from >10,000 references, encompassing approximately 3000 representative transporters and ...
GFINDer: Genome Function INtegrated Discoverer through dynamic annotation, statistical analysis, and mining
Statistical and clustering analyses of gene expression results from high-density microarray experiments produce lists of hundreds of genes regulated differentially, or with particular expression profiles, in the conditions under study. Independent of the microarray platforms and analysis methods used, these lists must be biologically interpreted to gain a better knowledge of the patho-physiolog...
The Visualization of Evolving Searches
It is a common misconception that all web searches can be answered with a single query. It is true that when users have a clear idea of what they are searching for, they can specify an accurate and efficient query to the search engine and find pertinent results in the first 10 search results returned. However, studies of search engine usage by Jansen et al. ([56], [57], [59]) show that, on aver...
Familiarity with and Use of Web 2.0 Tools in Library Services by Librarians Working at Iran, Tehran, and Shahid Beheshti Universities of Medical Sciences
Background and Aim: Web 2.0 technology has various usages in libraries all over the world. According to studies, however, it seems that this technology is rarely used in Iranian academic libraries. Therefore, the present study aims to determine the level of familiarity with and use of Web 2.0 tools among librarians working at Iran, Tehran, and Shahid Beheshti Universities of Medical Sciences. ...
Using unlabeled data to improve classification in the naive bayes approach: Application to web searches
This paper introduces a method to build a classifier based on labeled and unlabeled data. We set up the EM algorithm steps for the particular case of the naive Bayes approach and show empirical work for the restricted web page database. Original contributions include the application of the EM algorithm to simulated data in order to see the behavior of the algorithm for different numbers of lab...